Spreadinterponly (CPU and GPU) #602

Open · wants to merge 30 commits into base: master
Conversation

@ahbarnett (Collaborator) commented Jan 8, 2025:

This is a proposal for how to give convenient access to the spread/interp task via the FINUFFT/cuFINUFFT interface (since the spreadinterp module is not part of our API). It is based on @chaithyagr's PR #599 for the new CPU opts field. The GPU s/i-only option is already part of 2.3.1 and master.

Design discussion:

I feel that the GPU spreadinterponly interface has to be tweaked so that upsampfac controls the kernel shape. This would also apply to the CPU. Currently the GPU forces upsampfac=1.0, but under the hood uses the kernel for the default upsampfac=2, which makes no sense.
E.g., I would like to be able to set upsampfac=infinity, which gives a certain kernel shape needed in Ewald methods.

I will hash this out here. Comments welcome.

Tasks:

  • clean up CPU spreadinterponly=1 logic
  • add the new opts field to all language interfaces and check
  • example which demos it (1D)
  • add tester to CI (1D)
  • docs CPU
  • MATLAB demo CPU
  • Revisit gpu_spreadinterponly=1 and edit logic so upsampfac is respected rather than forced to be 1.0
  • GPU doc copying CPU doc
  • remove error code 23, since when spreadinterponly=1 any upsampfac in (1,Inf] is valid.

@ahbarnett (Collaborator, Author):

@chaithyagr If you look at the example, tester, and docs/opts.rst you will see my proposal for the CPU spreadinterponly=1 implementation. See what you think. If you are happy with it, let me know and I will apply this to the GPU too; the difference is that you would not have to set upsampfac=1 to use it, but rather would set it as for a NUFFT. This is really the only way I can see it working without breaking the API.
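
For concreteness, a rough sketch of the intended call pattern from Python (assuming the new spreadinterponly opt ends up exposed as a keyword argument in the Python wrapper, which is still on the task list above):

```python
import numpy as np
import finufft

N = 200                    # number of grid points (the type-1 "modes")
M = 1000                   # number of nonuniform points
x = 2 * np.pi * np.random.rand(M) - np.pi          # NU points in [-pi, pi)
c = np.random.randn(M) + 1j * np.random.randn(M)   # strengths

# Spread only (type 1 with spreadinterponly=1): no FFT, no upsampling;
# tol (eps) and upsampfac merely select the kernel shape/width.
f = finufft.nufft1d1(x, c, N, eps=1e-6, spreadinterponly=1, upsampfac=2.0)

# Interp only (type 2 with spreadinterponly=1): values at the NU points,
# interpolated from the user's N-point grid.
c2 = finufft.nufft1d2(x, f, eps=1e-6, spreadinterponly=1, upsampfac=2.0)
```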

I still would be curious how you guys access the actual kernel function used, in your DCW for MRI application...

@ahbarnett (Collaborator, Author):

@chaithyagr Still hoping for some feedback on the GPU interface tweak, so I can go ahead (early February at this rate) with:

Revisit gpu_spreadinterponly=1 and edit logic so upsampfac is respected rather than forced to be 1.0.

@chaithyagr (Contributor):

Hey @ahbarnett, I am so sorry for my delay. I think I responded to your question regarding ns going to infinity in #564.
Coming to this update:
I am okay with this new interface; I will have to change some stuff in mind-inria/mri-nufft#195 and mind-inria/mri-nufft#224.
Just some quick questions to be sure I understood correctly:

  1. So right now, if spreadinterp_only is set to 1, the output is still on the upsampfac>1 grid, so basically we will need to send arrays of the right sizes on our own? This was not needed when we do a NUFFT, as the output shape is exactly the size we need.

To be clearer:
If we are doing a NUFFT with a 1000-point trajectory to a shape of (256, 256) and upsampfac=2,
then the image size in the forward NUFFT is (256, 256), but if spreadinterp_only is set, it is (512, 512), right?

The same applies to the adjoint operation, where we provide a memory location with the output image dimensions.

  2. Do we have checks on the input shapes of the images? I didn't find this originally, and I just wanted to make sure that the sizes passed to the API remain the same, which is why I chose the route of making upsampfac=1.

Finally, please do let me know when you are ready; I would like to make sure my tests pass and can make fixes right away on my side.

Thank you so much for handling this. I will make my changes on the finufft side at mri-nufft today and give you an update on whether it works well.

@chaithyagr (Contributor):

Assuming that we need to provide (512, 512), note that the current state of the code leads to the following error:

```python
if fshape[-1] != ms:
    raise RuntimeError('FINUFFT f.shape is not consistent with n_modes')
if dim > 1:
    if fshape[-2] != mt:
        raise RuntimeError('FINUFFT f.shape is not consistent with n_modes')
if dim > 2:
    if fshape[-3] != mu:
        raise RuntimeError('FINUFFT f.shape is not consistent with n_modes')
```

Basically the validity checks are right, except that we are pushing out a larger array when spreadinterp_only is true.

@DiamonDinoia marked this pull request as draft on January 24, 2025.
@chaithyagr (Contributor):

@ahbarnett did you get a chance to look at my comments? The main issue is that setting spreadinterp_only=1 and upsampfac != 1 implies we need to update the API interface so that it can support image-domain shapes of img_size * upsampfac.

@DiamonDinoia mentioned this pull request on Jan 30, 2025.
@ahbarnett (Collaborator, Author):

@chaithyagr Sorry about the delay. But the answer to your question is no: the output array for a type 1 always has the requested size. Recall that when spreadinterponly=1, upsampfac only controls the kernel choice and width in gridpoints; there is no actual upsampling, just spreading to the user-requested grid. It wouldn't make sense to do much else. The spreading kernel must be controlled somehow, and this is the most elegant way to do it (via tol and upsampfac, as kernel control parameters only). Note that for type 2 it would also be meaningless to "upsample" the user's regular input array: again, plain interpolation from their grid is done. See examples/spreadinterponly1d.cpp
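
To make the shape behavior concrete, here is a rough 2D sketch for the (256, 256) case you asked about (again treating the spreadinterponly keyword as an assumption about how the Python wrapper will expose the new opts field):

```python
import numpy as np
import finufft

N1, N2 = 256, 256
M = 1000
x = 2 * np.pi * np.random.rand(M) - np.pi
y = 2 * np.pi * np.random.rand(M) - np.pi
c = np.random.randn(M) + 1j * np.random.randn(M)

# Spread-only "type 1": the output grid is exactly the requested (N1, N2),
# even with upsampfac=2.0, since upsampfac only picks the kernel shape/width.
f = finufft.nufft2d1(x, y, c, (N1, N2), eps=1e-6,
                     spreadinterponly=1, upsampfac=2.0)
assert f.shape == (N1, N2)      # not (512, 512)

# Interp-only "type 2" reads from the same user-sized grid.
c2 = finufft.nufft2d2(x, y, f, eps=1e-6,
                      spreadinterponly=1, upsampfac=2.0)
assert c2.shape == (M,)
```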

Let me know if unclear. I will finish up the PR now on the GPU side. Best, Alex

@ahbarnett marked this pull request as ready for review on February 8, 2025.
@ahbarnett assigned and then unassigned chaithyagr on Feb 8, 2025.
@ahbarnett (Collaborator, Author):

@chaithyagr I am done and would like to bring it in, if you can let me know. From your end (GPU user) there is only one change: upsampfac can no longer be set to 1.0; it must be a valid setting so that the kernel shape is known. Recall the valid settings are 0.0 (auto-choose), 1.25 or 2.0 (for the fast kerevalmeth=0), or any number in (1,+Inf] for the slower kerevalmeth=1. Recall that upsampfac is only a kernel-shape control parameter, and does not scale the I/O f grid, which is always the requested size (N1*N2 etc). It is amusing that upsampfac=Inf is a valid setting for kerevalmeth=1, even though it could never be used for a NUFFT.

I have removed your error code 23, as there is no need. I improved the comments, and fixed the docs to match and to improve the explanation of the precise grid that is spread to / interpolated from.
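
On the GPU side, a rough sketch of what a call might then look like via the cufinufft Python interface (assuming gpu_spreadinterponly and upsampfac are passed through as plan keyword arguments, and using CuPy arrays; this is illustrative only, not a tested snippet from this PR):

```python
import numpy as np
import cupy as cp
import cufinufft

N1, N2 = 256, 256
M = 1000
x = (2 * np.pi * cp.random.rand(M) - np.pi).astype(cp.float32)
y = (2 * np.pi * cp.random.rand(M) - np.pi).astype(cp.float32)
c = (cp.random.randn(M) + 1j * cp.random.randn(M)).astype(cp.complex64)

# Spread-only plan: upsampfac is now a kernel control (e.g. 2.0 or 1.25),
# no longer forced to 1.0; the output grid is still the requested (N1, N2).
plan = cufinufft.Plan(1, (N1, N2), eps=1e-6, dtype="complex64",
                      gpu_spreadinterponly=1, upsampfac=2.0)
plan.setpts(x, y)
f = plan.execute(c)             # shape (N1, N2); no upsampled grid
```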

I added a matlab CPU demo that includes plotting the spreading kernel:
demo_sionly2d
Here you can see that the output array size is (500x1000) and that the origin precisely aligns with (250,500), i.e. (N1/2, N2/2).
This helps in understanding the overlay of the spreadinterp grid on the physical (for you, k-space) coordinates x,y in [-pi,pi], etc.
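
The same alignment can also be checked quickly from Python (the spreadinterponly keyword is again an assumption about the wrapper): spread a single unit-strength point at the origin and confirm the kernel peak lands at grid index (N1/2, N2/2).

```python
import numpy as np
import finufft

N1, N2 = 500, 1000
x = np.array([0.0])             # one NU point at the origin of [-pi, pi)^2
y = np.array([0.0])
c = np.array([1.0 + 0.0j])

f = finufft.nufft2d1(x, y, c, (N1, N2), eps=1e-6,
                     spreadinterponly=1, upsampfac=2.0)
peak = np.unravel_index(np.argmax(np.abs(f)), f.shape)
print(peak)                     # expect (250, 500), i.e. (N1//2, N2//2)
```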

Best, Alex

@chaithyagr (Contributor):

Hey @ahbarnett, that's awesome. Thank you for your updates. I will surely get back to you early next week!
Based on your response, I still don't think I understand what upsampfac does in your case...

My understanding, based on how it used to work in gpuNUFFT, is that:

  1. For the forward NUFFT (image to k-space): we take an image of size N to a Fourier-domain grid of size upsampfac*N, after which the data is interpolated to the k-space locations.

  2. For the adjoint NUFFT (k-space to image): we spread the data to an upsampfac*N grid, followed by an FFT and cropping in the image domain.

Am I missing something?

If the above is true, wouldn't removing the FFT imply exposing a larger grid image as input (for the forward) and as output (for the adjoint)? Right?

i.e., isn't the output the spread grid, with a different size?

I will go through the code to get more clarity. Btw, the CPU one works when we give the right shapes as expected, but I think the output was cropped. Hopefully I can get some example images to add clarity to my message above.

```cpp
if (opts.spreadinterponly) {   // (unusual case of no NUFFT, just report)
  // spreadinterp grid will simply be the user's "mode" grid...
  for (int idim = 0; idim < dim; ++idim) nfdim[idim] = mstu[idim];
```
@ahbarnett (Collaborator, Author) commented on the code above:

@chaithyagr this is the CPU version, where you see upsampling is switched off. Indeed, it needs to write to the user's array sizes, so it cannot be any other way.

@ahbarnett (Collaborator, Author):

@chaithyagr Sorry, I forgot to set nf1=ms (ie, N1), and the equivalents in other dims. I fixed that now. The problem is we don't have a tester for your gpu_spreadinterponly=1 mode, so CI doesn't catch such a bug.

For CPU I know the behavior is correct. I didn't understand your comment about output being cropped: that is just for plotting purposes! The I/O grid sizes are (500,1000) and indeed the result matches that. Re code, see l. 665 of src/finufft_core.cpp. I can't make github make a link to that since it's in a PR. I tagged you in a comment.

But, in terms of behavior, I feel that the docs are clear: there is no upsampling, and the "upsampfac" merely controls the kernel parameters. I'm not sure what's unclear about that. Please let me know.

PS I have no idea what gpuNUFFT does! (recall that none of these other codes provide actual mathematical formulae for what they do... I could do that for spreadinterponly=1 mode, but I want to know it's used first....)

Thanks, Alex


```cpp
if (d_plan->opts.gpu_spreadinterponly) {
  // spread/interp grid is precisely the user "mode" sizes, no upsampling
  nf1 = d_plan->ms;
```
@ahbarnett (Collaborator, Author) commented on the code above:

@chaithyagr ...and here is the corresponding GPU code. At the risk of repetition, since the user allocates an N1*N2 output array, spreading could not write to any other size without segfault. Agreed?

@ahbarnett (Collaborator, Author):

Indeed, I was able to hack my local test/cuda/cufinufft1d_test to set debug=1 and gpu_spreadinterponly=1,
giving the following.
This verifies that the nf1=N1, ie, user array sizes are respected, even though sigma=2 (upsampfac for kernel design):

```text
[time  ] dummy warmup call to CUFFT	 0.00434 s
[cufinufft] (ms,mt,mu): 1000 1 1
[setup_spreader] (kerevalmeth=1) eps=1e-06 sigma=2: chose ns=7 beta=16.1
[cufinufft] spreader options:
[cufinufft] nspread: 7
[cufinufft] bin size x: 1048
[cufinufft] shared memory required for the spreader: 16896
[cufinufft] spreadinterponly mode: (nf1,nf2,nf3) = (1000, 1, 1)
[time  ] cufinufft plan:		 0.00805 s
[cufinufft] plan->M=1000
[time  ] cufinufft setNUpts:		 0.000264 s
[time  ] cufinufft exec:		 0.00173 s
[time  ] cufinufft destroy:		 1.95e-05 s
[Method 1] 1000 U pts to 1000 NU pts in 0.0101 s:      9.94e+04 NU pts/s
					(exec-only thoughput: 5.78e+05 NU pts/s)
[gpu   ] one mode: rel err in F[370] is 5.62
```

Note the error is garbage, as it should be since it thinks it's a NUFFT :)

We need to add a GPU version of the basic SI-only math test that I created for the CPU. Anyway, I'll leave this now. Let me know if you are happy. Thanks, Alex

@chaithyagr (Contributor):

Hey @ahbarnett, I finally tested your code on our plugin and it works great! Thank you so much for your updates and for cleaning up my hacky implementation. Please let us know if you need anything else from us. Currently we have some basic Python tests in our repository. Sadly we don't have any C++ version of the tests. I can help with it if needed, but I can't work on it urgently; please let us know.

Thank you again for this work!

@DiamonDinoia (Collaborator):

> Currently we have some basic Python tests in our repository. Sadly we don't have any C++ version of the tests.

Could you send a link to the tests?

@chaithyagr (Contributor):

The ongoing PR is at mind-inria/mri-nufft#224

In particular, you can see https://github.com/chaithyagr/mri-nufft/blob/7187491268f19ded65627cbed8d0a4a7acfbb9b6/tests/operators/test_density_for_op.py#L28-L52

This test checks that the density compensation weights are approximately the inverse of the radial distance for radial trajectories. It's not a very specific mathematical test, but when the trajectory is Nyquist-sampled we expect the density to match the theoretical one. However, for MR recon, approximations are okay.
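
For context, the density-compensation use case typically follows a Pipe-Menon-style iteration built on exactly this spread/interp-only pair; a rough generic sketch (not the mri-nufft test linked above, and with the spreadinterponly keyword again assumed for the Python wrapper) would be:

```python
import numpy as np
import finufft

def pipe_dcw(x, y, n_grid, n_iter=10, eps=1e-6):
    """Illustrative Pipe-Menon-style weights: w <- w / |interp(spread(w))|."""
    w = np.ones(x.shape[0], dtype=np.complex128)
    for _ in range(n_iter):
        # spread the current weights onto the user-sized grid (no FFT)...
        f = finufft.nufft2d1(x, y, w, n_grid, eps=eps,
                             spreadinterponly=1, upsampfac=2.0)
        # ...then interpolate back to the sample locations
        d = finufft.nufft2d2(x, y, f, eps=eps,
                             spreadinterponly=1, upsampfac=2.0)
        w = w / np.abs(d)
    return np.real(w)
```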
